Re-ordering Source Sentences for SMT
Abstract
We propose a pre-processing stage for Statistical Machine Translation (SMT) systems in which the words of the source sentence are re-ordered according to the syntax of the target language before the alignment process, so that the statistical system finds a better alignment. We take a dependency parse of the source sentence and linearize it according to the syntax of the target language before it is used in either the training or the decoding phase. During this linearization, ordering decisions among dependency nodes that share a parent are based on two aspects: parent-child positioning and relation priority. To make the linearization process rule-driven, we assume that the relative word order of a dependency relation's relata depends neither on the semantic properties of the relata nor on the rest of the expression. We also assume that the relative word order of the various relations sharing a relatum does not depend on the rest of the expression. We experiment with a publicly available English-Hindi parallel corpus and show that our scheme improves the BLEU score.
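The two ordering aspects named in the abstract, parent-child positioning and relation priority, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `Node` class, the `RULES` table, the relation labels, the priority values, and the `DEFAULT` fallback are hypothetical and are not the rule set used in the paper. The table simply encodes a verb-final, postpositional order of the kind found in Hindi.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One word in a dependency tree (hypothetical structure for this sketch)."""
    word: str
    relation: str = "root"  # dependency relation linking this word to its parent
    children: List["Node"] = field(default_factory=list)


# Hypothetical rule table: relation -> (side, priority).
# side     : "pre" places the child's subtree before the parent, "post" after it
#            (parent-child positioning).
# priority : among siblings on the same side, lower values are emitted first
#            (relation priority).
RULES = {
    "nsubj": ("pre", 1),  # subject comes first
    "prep":  ("pre", 2),  # adpositional phrase before the object
    "dobj":  ("pre", 3),  # object immediately before the verb
    "pobj":  ("pre", 1),  # noun before its adposition (postposition order)
    "det":   ("pre", 1),  # determiner before its noun
}

# Assumed fallback for relations missing from the table.
DEFAULT: Tuple[str, int] = ("post", 0)


def linearize(node: Node) -> List[str]:
    """Return the words of the subtree rooted at `node` in rule-driven order."""
    pre, post = [], []
    for child in node.children:
        side, priority = RULES.get(child.relation, DEFAULT)
        (pre if side == "pre" else post).append((priority, child))

    # Stable sorts: siblings with equal priority keep their input order.
    pre.sort(key=lambda item: item[0])
    post.sort(key=lambda item: item[0])

    words: List[str] = []
    for _, child in pre:
        words.extend(linearize(child))
    words.append(node.word)
    for _, child in post:
        words.extend(linearize(child))
    return words


# "John ate an apple in the garden"
tree = Node("ate", children=[
    Node("John", "nsubj"),
    Node("apple", "dobj", [Node("an", "det")]),
    Node("in", "prep", [Node("garden", "pobj", [Node("the", "det")])]),
])

print(" ".join(linearize(tree)))  # John the garden in an apple ate
```

Because each decision looks only at the relation label of a child, the sketch mirrors the stated assumption that word order does not depend on the semantic properties of the relata or on the rest of the expression. The same function would be applied to source sentences in both the training and the decoding phase.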
Similar resources
Improving a Statistical MT System with Automatically Learned Rewrite Patterns
• Limitation of current phrase-based SMT
• No mechanism for expressing and using linguistic phrases in reordering
• Ordering of target words does not respect linguistic phrase boundaries
• Xia and McCord’s solution:
• Extract linguistic rewrite rules from corpora
• Preprocess source sentences so phrase ordering is similar to that of target language
• Perform SMT decoding with monotonic ordering c...
Word re-ordering and dynamic programming based search algorithm for statistical machine translation
In this work, a new search procedure for statistical machine translation (SMT) is proposed that is based on dynamic programming (DP). The starting point is a DP solution to the traveling salesman problem that works by jointly processing tours that visit the same subset of cities. For SMT, the cities correspond to source sentence positions to be translated. Imposing restrictions on the order in ...
Post-ordering in Statistical Machine Translation
In the field of statistical machine translation (SMT), pre-ordering is an approach that has recently attracted attention; it reorders source language words into the target language order prior to SMT decoding. It is effective for long-distance reordering in SMT, especially between languages with distant word ordering like English and Japanese. Its key idea is to decompose the SMT problem into two subproblems of t...
Patent Claim Translation based on Sublanguage-specific Sentence Structure
Patent claim sentences, despite their legal importance in patent documents, still pose difficulties for state-of-the-art statistical machine translation (SMT) systems owing to their extreme lengths and their special sentence structure. This paper describes a method for improving the translation quality of claim sentences, by taking into account the features specific to the claim sublanguage. Ou...
The Impact of Source–Side Syntactic Reordering on Hierarchical Phrase-based SMT
Syntactic reordering has been demonstrated to be helpful and effective for handling different word orders between source and target languages in SMT. However, in terms of hierarchical PB-SMT (HPB), does syntactic reordering still have a significant impact on performance? This paper introduces a reordering approach which explores the 的 (DE) grammatical structure in Chinese. We employ the S...